Improving cache locality for GPU-based volume rendering
نویسندگان
چکیده
We present a cache-aware method for accelerating texture-based volume rendering on a graphics processing unit (GPU). Because a GPU has hierarchical architecture in terms of processing and memory units, cache optimization is important to maximize performance for memory-intensive applications. Our method localizes texture memory reference according to the location of the viewpoint and dynamically selects the width and height of thread blocks (TBs) so that each warp, which is a series of 32 threads processed simultaneously, can minimize memory access strides. We also incorporate transposed indexing of threads to perform TB-level cache optimization for specific viewpoints. Furthermore, we maximize TB size to exploit spatial locality with fewer resident TBs. For viewpoints with relatively large strides, we synchronize threads of the same TB at regular intervals to realize synchronous ray propagation. Experimental results indicate that our cache-aware method doubles the worst rendering performance compared to those provided by the CUDA and OpenCL software development kits.
منابع مشابه
Improving Cache Locality for Ray Casting with CUDA
In this paper, we present an acceleration method for texture-based ray casting on the compute unified device architecture (CUDA) compatible graphics processing unit (GPU). Since ray casting is a memory-intensive application, our method increases the hit rate of the texture cache during rendering. To achieve this, our method dynamically selects the width and height of thread blocks (TBs) such th...
متن کاملImproving Inter-thread Data Sharing with GPU Caches
The massive amount of fine-grained parallelism exposed by a GPU program makes it difficult to exploit shared cache benefits even there is good program locality. The non deterministic feature of thread execution in the bulk synchronize parallel (BSP) model makes the situation even worse. Most prior work in exploiting GPU cache sharing focuses on regular applications that have linear memory acces...
متن کاملShear-rotation-warp volume rendering
Shear–warp volume rendering has the advantages of a moderate image quality and a fast rendering speed. However, in the case of dynamic changes in the opacity transfer function, the efficiency of memory access drops, as the method cannot exploit pre-classified volumes. In this paper, we propose an efficient algorithm that exploits the spatial locality of memory references for interactive classif...
متن کاملZippy: A Framework for Computation and Visualization on a GPU Cluster
Due to its high performance/cost ratio, a GPU cluster is an attractive platform for large scale general-purpose computation and visualization applications. However, the programming model for high performance generalpurpose computation on GPU clusters remains a complex problem. In this paper, we introduce the Zippy framework, a general and scalable solution to this problem. It abstracts the GPU ...
متن کاملGigaVoxels: A Voxel-Based Rendering Pipeline For Efficient Exploration Of Large And Detailed Scenes
In this thesis, we present a new approach to efficiently render large scenes and detailed objects in realtime. Our approach is based on a new volumetric pre-filtered geometry representation and an associated voxel-based approximate cone tracing that allows an accurate and high performance rendering with high quality filtering of highly detailed geometry. In order to bring this voxel representat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Parallel Computing
دوره 40 شماره
صفحات -
تاریخ انتشار 2014